The Impact of Task and Corpus on Event Extraction Systems

نویسنده

  • Ralph Grishman
چکیده

The term “event extraction” covers a wide range of information extraction tasks, and methods developed and evaluated for one task may prove quite unsuitable for another. Understanding these task differences is essential to making broad progress in event extraction. We look back at the MUC and ACE tasks in terms of one characteristic, the breadth of the scenario – how wide a range of information is subsumed in a single extraction task. We examine how this affects strategies for collecting information and methods for semi-supervised training of new extractors. We also consider the heterogeneity of corpora – how varied the topics of documents in a corpus are. Extraction systems may be intended in principle for general news but are typically evaluated on topic-focused corpora, and this evaluation context may affect system design. As one case study, we examine the task of identifying physical attack events in news corpora, observing the effect on system performance of shifting from an attack-event-rich corpus to a more varied corpus and considering how the impact of this shift may be mitigated.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Corpus based coreference resolution for Farsi text

"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

متن کامل

The Event StoryLine Corpus: A New Benchmark for Causal and Temporal Relation Extraction

This paper reports on the Event StoryLine Corpus (ESC) v0.9, a new benchmark dataset for the temporal and causal relation detection. By developing this dataset, we also introduce a new task, the StoryLine Extraction from news data, which aims at extracting and classifying events relevant for stories, from across news documents spread in time and clustered around a single seminal event or topic....

متن کامل

Leveraging Dependency Regularization for Event Extraction

Event Extraction (EE) is a challenging Information Extraction task which aims to discover event triggers with specific types and their arguments. Most recent research on Event Extraction relies on pattern-based or feature-based approaches, trained on annotated corpora, to recognize combinations of event triggers, arguments, and other contextual information. These combinations may each appear in...

متن کامل

Event Role Labelling using a Neural Network Model (Étiquetage en rôles événementiels fondé sur l'utilisation d'un modèle neuronal) [in French]

Information Extraction systems must cope with two problems : they heavily depend on the considered domain but the cost of development for a domain-specific system is important. We propose a new solution for role labeling in the event-extraction task that relies on using unsupervised word representations (word embeddings) as word features. We automatically learn domain-relevant distributed repre...

متن کامل

Including New Patterns to Improve Event Extraction Systems

Event Extraction (EE) is a challenging Information Extraction task which aims to discover event triggers of specific types along with their arguments. Most recent research on Event Extraction relies on pattern-based or feature-based approaches, trained on annotated corpora, to recognize combinations of event triggers, arguments, and other contextual information. However, as the event instances ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010